Online Learning Rate Adaptation with Hypergradient Descent

Authors

  • Atilim Gunes Baydin
  • Robert Cornish
  • David Martinez Rubio
  • Mark Schmidt
  • Frank D. Wood
Abstract

We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the manual tuning of the initial learning rate for these commonly used algorithms. Our method works by dynamically updating the learning rate during optimization using the gradient with respect to the learning rate of the update rule itself. Computing this “hypergradient” needs little additional computation, requires only one extra copy of the original gradient to be stored in memory, and relies upon nothing more than what is provided by reverse-mode automatic differentiation.
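To make the update concrete, the sketch below applies the idea to plain SGD: since the previous parameter update was theta_t = theta_{t-1} - alpha * grad_{t-1}, the derivative of the current loss with respect to alpha is the negative dot product of the current and previous gradients, so descending on alpha adds a term proportional to that dot product. This is a minimal sketch under simplifying assumptions; the names (hd_sgd, grad_fn) and hyperparameter values are illustrative and not taken from the paper.

```python
import numpy as np

def hd_sgd(grad_fn, theta, alpha=0.01, beta=1e-4, num_steps=100):
    """SGD whose learning rate is adapted online by a hypergradient step.

    grad_fn -- callable returning the (stochastic) gradient at theta
    alpha   -- initial learning rate, updated during optimization
    beta    -- step size for the learning-rate update (hypergradient rate)
    """
    prev_grad = np.zeros_like(theta)   # the one extra stored gradient copy
    for _ in range(num_steps):
        grad = grad_fn(theta)
        # d f(theta_t) / d alpha = -(grad . prev_grad), so gradient descent
        # on alpha means adding beta * (grad . prev_grad).
        alpha = alpha + beta * float(np.dot(grad, prev_grad))
        theta = theta - alpha * grad
        prev_grad = grad
    return theta, alpha

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
theta0 = np.array([5.0, -3.0])
theta_opt, alpha_final = hd_sgd(lambda x: x, theta0, alpha=0.05, beta=1e-3)
print(theta_opt, alpha_final)
```

The same learning-rate update can be wrapped around other base optimizers (the paper applies it to SGD with Nesterov momentum and Adam); only the expression being differentiated with respect to the learning rate changes.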


Similar articles

Designing stable neural identifier based on Lyapunov method

The stability of the learning rate in neural network identifiers and controllers is one of the challenging issues that attracts great interest from researchers of neural networks. This paper suggests an adaptive gradient descent algorithm with stable learning laws for a modified dynamic neural network (MDNN) and studies the stability of this algorithm. Also, a stable learning algorithm for parameters of ...


Online Learning with Adaptive Local Step Sizes

Almeida et al. have recently proposed online algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton’s work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical indep...


Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online sto...


Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World

In this paper, we focus on online representation learning in non-stationary environments which may require continuous adaptation of the model’s architecture. We propose a novel online dictionary-learning (sparse-coding) framework which incorporates the addition and deletion of hidden units (dictionary elements), and is inspired by the adult neurogenesis phenomenon in the dentate gyrus of the hippoc...


Online Independent Component Analysis with Local Learning Rate Adaptation

Stochastic meta-descent (SMD) is a new technique for online adaptation of local learning rates in arbitrary twice-differentiable systems. Like matrix momentum it uses full second-order information while retaining O(n) computational complexity by exploiting the efficient computation of Hessian-vector products. Here we apply SMD to independent component analysis, and employ the resulting algorith...



Journal title:
  • CoRR

Volume: abs/1703.04782

Publication year: 2017